ABSTRACT

This paper describes the architecture and implementation of a high-speed decompression engine for 
embedded processors. The engine is targeted to processors where embedded programs are stored in 
compressed form, and decompressed at runtime during instruction cache refill. The decompression 
engine uses a unique asynchronous variable decompression rate architecture to process
Huffman-encoded instructions. The resulting circuit is significantly smaller than comparable synchronous
decoders, yet has a higher throughput rate than almost all existing designs. The 0.8 micron layout is 
all full-custom and contains predominantly dynamic domino logic. The top-level control, as well as 
several small state machines, are implemented using asynchronous logic. The design operates 
without a user-supplied clock. Simulations using Lsim show average throughput of 32 bits/45 ns on 
the output side, corresponding to about 480 Mbit/sec on the input side. The chip has been 
manufactured by MOSIS; tests show that the asynchronous implementation operates correctly, with 
an average throughput exceeding simulations: 32 bits/39 ns on the output side, corresponding to 
about 560 Mbit/sec on the input side. This speed is acceptable for our application. The area of the 
design (excluding the pad-frame overhead) is only 0.75 mm². The design is the first
fabricated chip for an instruction decompression unit for embedded processors.
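
The throughput figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below converts the "32 bits per N ns" output figures to Mbit/sec and divides the quoted input rates by them. The resulting input/output ratio is an inference, not a number stated in the abstract, and it assumes the input and output rates were measured over the same interval.

```python
def mbit_per_sec(bits: int, nanoseconds: float) -> float:
    """Convert 'bits per interval in ns' to Mbit/sec (1 bit/ns = 1000 Mbit/sec)."""
    return bits / nanoseconds * 1000.0

# Output-side rates quoted in the abstract.
simulated_out = mbit_per_sec(32, 45)   # Lsim simulation
measured_out = mbit_per_sec(32, 39)    # fabricated chip

# Input-side rates quoted in the abstract (approximate, "about" in the text).
simulated_in = 480.0
measured_in = 560.0

# Implied compressed/uncompressed ratio (an inference, see the note above).
simulated_ratio = simulated_in / simulated_out
measured_ratio = measured_in / measured_out

print(f"simulated: {simulated_out:.1f} Mbit/sec out, ratio {simulated_ratio:.3f}")
print(f"measured:  {measured_out:.1f} Mbit/sec out, ratio {measured_ratio:.3f}")
```

Both cases give a ratio near 0.68, so the simulated and measured figures are at least mutually consistent.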
1 Session 6C: Markovian analysis and asynchronous circuits: Pipeline optimization for
asynchronous circuits: complexity analysis and an efficient optimal algorithm
Sangyun Kim, Peter A. Beerel 

November 2000 Proceedings of the 2000 IEEE/ACM international conference on
Computer-aided design 

Full text available: pdf(113.08 KB) Additional Information: full citation, abstract, references

This paper addresses the problem of identifying the minimal pipelining needed in an 
asynchronous circuit (e.g., number/size of pipeline stages/latches required) to satisfy a 
given performance constraint, thereby implicitly minimizing area and power for a given 
performance. In contrast to the somewhat analogous problem of retiming in the 
synchronous domain, we first show that the basic pipeline optimization problem for 
asynchronous circuits is NP-complete. This paper then presents an effic ... 



2 Statistically optimized asynchronous barrel shifters for variable length codecs
Peter A. Beerel, Sangyun Kim, Pei-Chuan Yeh, Kyeounsoo Kim 
August 1999 Proceedings of the 1999 international symposium on Low power 
electronics and design 

Full text available: pdf(337.23 KB) Additional Information: full citation, references, index terms



3 Generation of fast interpreters for Huffman compressed bytecode
Mario Latendresse, Marc Feeley 

June 2003 Proceedings of the 2003 workshop on Interpreters, Virtual Machines and 
Emulators 

Full text available: pdf(323.22 KB) Additional Information: full citation, abstract, references, index terms

Embedded systems often have severe memory constraints requiring careful encoding of 
programs. For example, smart cards have on the order of 1K of RAM, 16K of non-volatile
memory, and 24K of ROM. A virtual machine can be an effective approach to obtain compact 
programs but instructions are commonly encoded using one byte for the opcode and 
multiple bytes for the operands, which can be wasteful and thus limit the size of programs 
runnable on embedded systems. Our approach uses canonical Huffman co ... 

Keywords: Java, canonical Huffman code, code compression, decoder 
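
For readers unfamiliar with the canonical Huffman codes named in the abstract above, here is a minimal sketch of how such codes are assigned from code lengths and then decoded. It illustrates the general technique only and is not the authors' interpreter generator. The opcode names and code lengths are hypothetical.

```python
def canonical_codes(lengths: dict[str, int]) -> dict[str, str]:
    """Assign canonical Huffman codewords given each symbol's code length."""
    codes: dict[str, str] = {}
    code = 0
    prev_len = 0
    # Shorter codes come first; ties are broken by symbol order.
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len          # pad with zeros when length grows
        codes[sym] = format(code, f"0{length}b")
        code += 1                           # next codeword of the same length
        prev_len = length
    return codes


def decode(bits: str, codes: dict[str, str]) -> list[str]:
    """Decode a bit string by matching prefix-free codewords left to right."""
    inverse = {cw: sym for sym, cw in codes.items()}
    out: list[str] = []
    current = ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return out


# Hypothetical opcode set: frequent opcodes get shorter codes.
lengths = {"load": 1, "add": 2, "store": 3, "jump": 3}
codes = canonical_codes(lengths)
program = decode("0101100111", codes)
print(codes)
print(program)
```

Because the codewords follow from the lengths alone, a decoder only needs the length table, which is one reason canonical codes suit memory-constrained targets.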
4 Optimization: Reducing probabilistic timed Petri nets for asynchronous architectural
analysis

Sangyun Kim, Sunan Tugsinavisut, Peter Beerel 

December 2002 Proceedings of the 8th ACM/IEEE international workshop on Timing 
issues in the specification and synthesis of digital systems 

Full text available: pdf(541.04 KB) Additional Information: full citation, abstract, references, index terms

This paper introduces structural reductions of probabilistic timed Petri nets that preserve a 
large class of performance measurements. In particular, the paper proposes a class of 
reductions that preserve efficiently computable bounds of statistics of time-separation of 
events (TSEs). It identifies two specific reductions within this class. It demonstrates the 
utility of these reductions by reducing a detailed Petri net describing the four-phase protocol 
of a well-known asynchronous pipeline tern ... 

5 Advances in synthesis: Implementing asynchronous circuits using a conventional EDA
tool-flow

Christos P. Sotiriou 

June 2002 Proceedings of the 39th conference on Design automation 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

This paper presents an approach by which asynchronous circuits can be realised with a 
conventional EDA tool flow and conventional standard cell libraries. Based on a gate-level 
asynchronous circuit implementation technique, direct-mapping, and by identifying the delay 
constraints and exploiting certain EDA tool features, this paper demonstrates that a 
conventional EDA tool flow can be used to describe, place, route and timing-verify 
asynchronous circuits. 

Keywords: EDA, asynchronous, tool-flow 



6 Practical advances in asynchronous design and in asynchronous/synchronous
interfaces

Erik Brunvand, Steven Nowick, Kenneth Yun 

June 1999 Proceedings of the 36th ACM/IEEE conference on Design automation

Full text available: pdf(155 ...) Additional Information: full citation, references, citings, index terms



7 Unifying synchronous/asynchronous state machine synthesis
Kenneth Y. Yun, David L. Dill 

November 1993 Proceedings of the 1993 IEEE/ACM international conference on 
Computer-aided design 

Full text available: pdf(800.19 KB) Additional Information: full citation, references, citings



8 Testing redundant asynchronous circuits by variable phase splitting
Luciano Lavagno, Antonio Lioy, Michael Kishinevsky 

September 1994 Proceedings of the conference on European design automation 

Full text available: pdf(700.22 KB) Additional Information: full citation, references, citings, index terms



9 Algorithms for synthesis of hazard-free asynchronous circuits
L. Lavagno, K. Keutzer, A. Sangiovanni-Vincentelli

http://portal.acm.org/resultsxfhi?coll-ACM&dl-ACM&CFID=25948914&C^ 8/21/04 



Results (page 1): asynchronous huffman


June 1991 Proceedings of the 28th conference on ACM/IEEE design automation 

Full text available: pdf Additional Information: full citation, references, citings, index terms



10 A unified framework for race analysis of asynchronous networks
J. A. Brzozowski, C.-J. Seger 

January 1989 Journal of the ACM (JACM), volume 36 issue 1

Full text available: pdf(2.11 MB) Additional Information: full citation, abstract, references, citings, index terms, review

A unified framework is developed for the study of asynchronous circuits of both gate and
MOS type. A basic network model consisting of a directed graph and a set of vertex 
excitation functions is introduced. A race analysis model, using three values (0, 1, and x), is
developed for studying state transitions in the network. It is shown that the results obtained 
using this model are equivalent to those using ternary simulation. It is also proved that the 
set of state variables can be reduced ... 

11 An efficient critical race-free state assignment technique for asynchronous finite state
machines

Tam Anh Chu, Narayana Mani, Clement K. C. Leung 

July 1993 Proceedings of the 30th international conference on Design automation 
Arpad Beszedes, Rudolf Ferenc, Tibor Gyimothy, Andre Dolenc, Konsta Karsisto
September 2003 ACM Computing Surveys (CSUR), volume 35 issue 3 

Full text available: pdf(443.89 KB) Additional Information: full citation, abstract, references, index terms

Program code compression is an emerging research activity that is having an impact in 
several production areas such as networking and embedded systems. This is because the 
reduced-sized code can have a positive impact on network traffic and embedded system 
costs such as memory requirements and power consumption. Although code-size reduction 
is a relatively new research area, numerous publications already exist on it. The methods 
published usually have different motivations and a variety of appli ...

Keywords: code compaction, code compression, method assessment, method evaluation 



Compressing MIPS code by multiple operand dependencies
November 2003 ACM Transactions on Embedded Computing Systems (TECS), volume 2
Full text available: pdf(576.31 KB) Additional Information: full citation, abstract, references, index terms

Intuitively, destination registers of some instructions have great possibilities to be used as 
the source registers of the immediately subsequent instructions. Such destination 
register/source register pairs have been exploited previously to improve code compression 
ratio [compression ratio = (Dictionary Size + Encoded Program Size) / Original Program
Size]. This paper further examines the exploitation of both register and immediate operand
dependencies to improve the c ... 

Keywords: Code compression, benchmarks, data compression, instruction set architecture 
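
The compression ratio defined in the abstract above is easy to misapply if the dictionary is left out, so a small worked example may help. The function below follows the bracketed definition directly. The byte counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def compression_ratio(dictionary_size: int,
                      encoded_program_size: int,
                      original_program_size: int) -> float:
    """(Dictionary Size + Encoded Program Size) / Original Program Size.

    All three sizes must be in the same unit (bytes here); lower is better.
    """
    if original_program_size <= 0:
        raise ValueError("original program size must be positive")
    return (dictionary_size + encoded_program_size) / original_program_size


# Hypothetical sizes in bytes, for illustration only.
ratio = compression_ratio(dictionary_size=2_000,
                          encoded_program_size=58_000,
                          original_program_size=100_000)
print(f"compression ratio = {ratio:.2f}")
```

With these numbers the encoded program alone would give 0.58, while counting the dictionary gives 0.60, which is the figure the definition asks for.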

This paper presents an automated method for the synthesis of multiple-input-change (MIC) 
asynchronous state machines. Asynchronous state machine design is subtle since, unlike 
synchronous synthesis, logic must be implemented without hazards, and state codes must 
be chosen carefully to avoid critical races. We formulate and solve an optimal hazard-free 
and critical race-free encoding problem for a class of MIC asynchronous state machines 
called burst-mode. Analogous to a paradigm successfully use ... 

Keywords: optimal state assignment, asynchronous state machines, hazards, sequential 
synthesis, sequential optimization 
In embedded system design, memory has been one of the most restricted resources.
Reducing program size has been an important goal when designing an embedded system. 
Most of the previous work on code compression has targeted RISC architectures. Recently 
VLIW processors became very popular, particularly for signal processing. Decompression 
speed is especially important for VLIW architectures given that the length of the instruction 
word is long. Furthermore, modern VLIW architectures use flexible ... 
Simulation of digital logic provides a viable technique for development and diagnosis of
digital systems. Simulation models currently employed are discussed with a summary of 
structure and timing techniques. A methodology for functional simulation in conjunction with
gate level simulation is discussed, presenting a representative set of predefined functions, 
and introducing a measure for predefined function performance. Errors in design detectable 
at the functional level are categorized.

Keywords: diagnosis of digital systems, digital simulation, fault simulation, functional 
simulation, logic design 
Compressing the instructions of an embedded program is important for cost-sensitive
low-power control-oriented embedded computing. A number of compression schemes have been
proposed to reduce program size. However, the increased instruction density has an 
accompanying performance cost because the instructions must be decompressed before 
execution. In this paper, we investigate the performance penalty of a hardware-managed 
code compression algorithm recently introduced in IBM's PowerPC 405. ... 
This paper proves that, given a reliable time-bounded arbiter, it is possible to realize a
reliable (i.e. runt-free) inertial delay, and vice-versa. It therefore shows that the time- 
bounded arbiter and the inertial delay are equally realizable. Consequently all theoretical 
limitations which apply to one will apply, in some form, to the other as well. 